TruthfulQA: Measuring How Models Mimic Human Falsehoods
https://arxiv.org/abs/2109.07958
The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics.
We crafted questions that some humans would answer falsely due to a false belief or misconception.
https://github.com/sylinrl/TruthfulQA
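
The question and category counts stated above can be checked against a local copy of the data. The following is a minimal sketch, not the authors' tooling: it assumes the questions are available as a CSV file with a header row and a column named `Category`. The column name, the helper `count_by_category`, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def count_by_category(csv_file, column="Category"):
    """Tally rows per category from an open CSV file object.

    The column name "Category" is an assumption about the file layout.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical rows for illustration only; they are not taken from the benchmark.
sample = io.StringIO(
    "Category,Question\n"
    "Health,placeholder question 1\n"
    "Law,placeholder question 2\n"
    "Health,placeholder question 3\n"
)
counts = count_by_category(sample)

# Number of distinct categories, then total number of questions.
print(len(counts), sum(counts.values()))
```

To run this on real data, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`. If the layout assumption holds, the two printed numbers should be comparable to the 38 categories and 817 questions stated above. A mismatch would suggest a different column name or a different revision of the dataset.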